03 May 2018

show where your plots are
show how variables are distributed spatially
Open-source, free
Open-source, free
Workflow and reproducibility
Open-source, free
Workflow and reproducibility
Quite efficient



Geographic (or unprojected) CRS
Projected CRS
What are geographic coordinate systems?
What are projected coordinate systems?
Many, many ways to represent the 3-D shape of the earth and to project it in a 2-D plane
EPSG or a proj4stringThe EPSG code is a numeric representation of a CRS, while the proj4string reprensents the full set of parameters spelled out in a string:
EPSG4326<=> proj4+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
EPSG32188<=> proj4+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
All geographic files were created using a specific CRS - but it’s not always defined!
To find CRS in any format: Spatial Reference
sf
Why use the sf package when sp is already tried and tested?
Simple features refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers
successor of sp
sf incorporates the functionality of the 3 main packages of the sp paradigm in a single, cohesive whole:
sp for the class system;rgdal for reading and writing data;rgeos for spatial operations undertaken by GEOS).Why using sf while sp is still there and work just fine?
sf objects are easy to manipulate
st_ for easy tab completion%>%)dplyr and tidyr verbs have been defined for the sf objectsggplot2 friendlyMULTIPOINT - sample points of water quality measures in Montreal 1MULTILINESTRING - streams and rivers of Montreal 1MULTIPOLYGON - polygons of land use types 2MULTIPOLYGON - Canadian boundaries (retrieved directly from R)raster - canopy index of Montreal 2raster - altitude (retrieved directly from R)library(sf) # for simple features vector library(lwgeom) # for st_make_valid library(dplyr) # for data manipulation library(reshape2) # for data manipulation library(RColorBrewer) # for color palette library(raster) # for raster data library(mapview) # for interactive map
MULTIPOINTThis dataset defines the localisation of the sampling points for the RUISSO program. The latter consists in analyzing the bacteriological and physicochemical quality of inland streams and watercourses in Montreal.
# Create a new directory to download data
if(!dir.exists("data")) dir.create("data")
# Download csv file from web page in your working directory
if (!file.exists("data/ruisso.csv"))
download.file("http://donnees.ville.montreal.qc.ca/dataset/86843d31-4251-4002-b10d-620aaa751092/resource/adad6c48-fb48-40fc-a031-1ac870d978b4/download/scri03.-infor03.02-informatique03.02.07-donneesouvertesrsmaruissostationsstations_ruisso.csv",
destfile = "data/ruisso.csv")
# Read csv file in R
ruisso <- read.csv("data/ruisso.csv", header = T, dec = ",")
names(ruisso)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Localisation" "Administration" ## [5] "LATITUDE" "LONGITUDE" ## [7] "Activité"
sf MULTIPOINT objectruisso_sf <- st_as_sf(
x = ruisso,
coords = c("LONGITUDE", "LATITUDE"),
crs = 4326) # the CRS is given in the metadata
ruisso_sf
## Simple feature collection with 66 features and 5 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.42516 xmax: -73.50467 ymax: 45.69734 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## Plan.d.eau Point.d.échantillonnage ## 1 Rivière à l'Orme AAO-0.0 ## 2 Rivière à l'Orme AAO-1.5P1 ## 3 Rivière à l'Orme AAO-2.0P4 ## 4 Rivière à l'Orme AAO-3.3P6 ## 5 Rivière à l'Orme AAO-3.5 ## 6 Rivière à l'Orme AAO-3.6 ## 7 Rivière à l'Orme AAO-3.9P7 ## 8 Rivière à l'Orme AAO-6.4 ## 9 Rivière à l'Orme AAO-6.4P12 ## 10 Rivière à l'Orme AAO-6.5 ## Localisation ## 1 Pierrefonds, boul. Gouin O, 40m au nord de la rue de l'Anse à l'Orme, exutoire au lac des Deux Montagnes. ## 2 Pierrefonds, N ponceau boul.Gouin, 1500m en amont exutoire, branche provenant de l'est. ## 3 Ste-A.-de-Bellevue, branche drainant secteur ouest, 140m à l'est de la rue Leslie Dowker. ## 4 Kirkland, 60m au sud de l'intersection des rues de l'Anse à l'Orme et de Timberley trail, derrière le dépôt à neige. ## 5 Sainte-Anne-de-Bellevue, 10m au nord du ch. Ste-Marie, 200m à l'ouest du ch. Anse à l'Orme. ## 6 Beaconsfield, 250m à l'est de la rue Lee et 25m au sud de l'autoroute 40, en amont du pluvial. ## 7 Beaconsfield, 240m à l'est de la rue Lee et 400m au sud de l'A40, embranchement provenant de la zone boisée entourant le boul. Lakeview. ## 8 200m à l'est du Boul.Morgan et 280m au sud de la transcanadienne, affluent provenant de Ste-Anne-de-Bellevue et de la partie O et S de la zone industriel de Baie d'Urfée. ## 9 Baie d'Urfée, boul. Morgan côté est, 250m au sud de l'autoroute 40, affluent provenant des zones résidentilelles de Baie d'Urfé. ## 10 Baie d'Urfée, boul.Morgan côté ouest, 250m au sud de l'A40. ## Administration Activité geometry ## 1 Pierrefonds-Roxboro Actif POINT (-73.93704 45.45022) ## 2 Pierrefonds-Roxboro Inactif POINT (-73.91931 45.44744) ## 3 Sainte-Anne-de-Bellevue Inactif POINT (-73.91535 45.44288) ## 4 Kirkland Actif POINT (-73.90147 45.43689) ## 5 Sainte-Anne-de-Bellevue Actif POINT (-73.90144 45.43566) ## 6 Baie d'Urfé Actif POINT (-73.9012 45.43462) ## 7 Beaconsfield Inactif POINT (-73.9002 45.43105) ## 8 Baie d'Urfé Inactif POINT (-73.91347 45.42674) ## 9 Baie d'Urfé Actif POINT (-73.91549 45.42531) ## 10 Baie d'Urfé Actif POINT (-73.91565 45.42542)
MULTIPOINTInstead of creating a single map, as with sp object, the default plot of sf object creates multiple maps, one for each attribute, which can be useful for exploring the spatial distribution of different variables.
plot(ruisso_sf)
MULTIPOINTTo plot only the geometry and not all attributes, we retrieve the geometry list-column using st_geometry():
plot(st_geometry(ruisso_sf))
This dataset contains the actual water quality measurements at the RUISSO sampling points.
# Download csv file from web page in your working directory
if (!file.exists("data/donnees_ruisso_2016.csv"))
download.file("http://donnees.ville.montreal.qc.ca/dataset/8c149ace-7b2f-4041-99ec-3bdbef5dcee6/resource/38c8eb26-46a1-4fc2-87a0-5c88e94cee8e/download/donnees_ruisso_2016.csv",
destfile = "data/ruisso_data.csv")
# Read csv file in R
ruisso_data <- read.csv("data/ruisso_data.csv", header = T, dec = ",")
head(ruisso_data)
## Point.d.échantillonnage Date Raison.d.annulation X.OD O2..mg.L. ## 1 AAO-0.0 2016/05/12 110.0 10.7 ## 2 AAO-0.0 2016/06/01 74.0 7.4 ## 3 AAO-0.0 2016/06/22 72.0 6.7 ## 4 AAO-0.0 2016/08/02 79.0 6.8 ## 5 AAO-0.0 2016/08/22 79.0 7.3 ## 6 AAO-0.0 2016/09/20 76.8 7.1 ## Conductivité..µs.cm2. pH Température..oC. Signe.COLI COLI MÉTÉO ## 1 514 7.9 16.9 < 10 1 ## 2 1434 7.9 15.0 = 54 1 ## 3 1474 7.7 19.1 = 230 0 ## 4 1346 7.9 22.3 = 300 1 ## 5 793 7.8 19.1 = 550 -1 ## 6 1286 7.7 18.8 = 72 -2 ## Ag..µg.L. Al..µg.L. As..µg.L. Ba..µg.L. Be..µg.L. Ca..µg.L. Cd..µg.L. ## 1 0.1 214 0.4 37 0.1 44200 0.1 ## 2 0.1 60 0.5 69 0.1 109000 0.1 ## 3 0.1 761 0.9 58 0.1 104000 0.1 ## 4 0.1 214 0.8 67 0.1 98200 0.1 ## 5 0.1 353 0.5 56 0.1 71200 0.1 ## 6 0.1 207 0.5 61 0.1 102000 0.1 ## Co..µg.L. COT..µg.L. Cr..µg.L. Cu..µg.L. Fe..µg.L. K..µg.L. Mg..µg.L. ## 1 0.2 7.2 0.6 1.9 364 1970 15600 ## 2 0.2 6.4 0.3 3.2 271 4190 37100 ## 3 0.8 6.1 2.2 4.3 1200 4600 39300 ## 4 0.3 15.1 0.5 1.6 393 4180 36000 ## 5 0.4 4.3 1.3 3.0 533 3150 19500 ## 6 0.2 5.3 0.7 3.2 367 4100 33700 ## Mn..µg.L. Mo..µg.L. Na..µg.L. NH3..µg.L. Ni..µg.L. Ptot..µg.L. Pb..µg.L. ## 1 40.4 1.5 46600 30 1.2 37 0.5 ## 2 70.3 4.1 100000 160 2.3 33 0.2 ## 3 82.8 4.1 100000 290 3.0 129 3.2 ## 4 65.1 3.9 100000 90 2.0 67 0.9 ## 5 44.2 2.9 58200 120 2.3 55 0.8 ## 6 18.9 3.3 100000 40 2.7 31 0.7 ## MES...mg.L. Sb..µg.L. Se..µg.L. Sn2.ug.L Tl.ug.L U..µg.L. V..µg.L. ## 1 6.8 0.5 0.5 1 0.2 0.8 0.9 ## 2 5.7 0.5 0.5 1 0.2 1.7 0.6 ## 3 35.3 0.5 0.5 1 0.2 1.8 2.8 ## 4 9.0 0.5 0.5 1 0.2 1.8 1.7 ## 5 10.3 0.5 0.5 1 0.2 1.2 1.7 ## 6 3.6 0.5 0.5 1 0.2 1.3 1.2 ## Zn..µg.L. ## 1 7 ## 2 7 ## 3 12 ## 4 7 ## 5 7 ## 6 7

Combining data from different sources is a common task in data preparation. left_join() from dplyr do this by combining tables based on a shared ‘key’ variable.
See dplyr and tidyr cheatsheet for other join functions.
ruisso_sf <- left_join(ruisso_sf, ruisso_data, by = "Point.d.échantillonnage") names(ruisso_sf)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Localisation" "Administration" ## [5] "Activité" "Date" ## [7] "Raison.d.annulation" "X.OD" ## [9] "O2..mg.L." "Conductivité..µs.cm2." ## [11] "pH" "Température..oC." ## [13] "Signe.COLI" "COLI" ## [15] "MÉTÉO" "Ag..µg.L." ## [17] "Al..µg.L." "As..µg.L." ## [19] "Ba..µg.L." "Be..µg.L." ## [21] "Ca..µg.L." "Cd..µg.L." ## [23] "Co..µg.L." "COT..µg.L." ## [25] "Cr..µg.L." "Cu..µg.L." ## [27] "Fe..µg.L." "K..µg.L." ## [29] "Mg..µg.L." "Mn..µg.L." ## [31] "Mo..µg.L." "Na..µg.L." ## [33] "NH3..µg.L." "Ni..µg.L." ## [35] "Ptot..µg.L." "Pb..µg.L." ## [37] "MES...mg.L." "Sb..µg.L." ## [39] "Se..µg.L." "Sn2.ug.L" ## [41] "Tl.ug.L" "U..µg.L." ## [43] "V..µg.L." "Zn..µg.L." ## [45] "geometry"
Keep only active sites
ruisso_sf1 <- filter(ruisso_sf, Activité == "Actif") # same as # ruisso_sf1 <- filter(ruisso_sf, Activité != "Inactif") dim(ruisso_sf)
## [1] 361 45
dim(ruisso_sf1)
## [1] 345 45
Remove unwanted variables
ruisso_sf2 <- dplyr::select(ruisso_sf1,
-Localisation,
-Activité,
-Raison.d.annulation,
-Signe.COLI,
-MÉTÉO)
Note:
rasteranddplyrpackages have a functionselect(). To avoid an error message when both packages are loaded, we explicitly use the namespace of the package :dplyr::select().
names(ruisso_sf2)
## [1] "Plan.d.eau" "Point.d.échantillonnage" ## [3] "Administration" "Date" ## [5] "X.OD" "O2..mg.L." ## [7] "Conductivité..µs.cm2." "pH" ## [9] "Température..oC." "COLI" ## [11] "Ag..µg.L." "Al..µg.L." ## [13] "As..µg.L." "Ba..µg.L." ## [15] "Be..µg.L." "Ca..µg.L." ## [17] "Cd..µg.L." "Co..µg.L." ## [19] "COT..µg.L." "Cr..µg.L." ## [21] "Cu..µg.L." "Fe..µg.L." ## [23] "K..µg.L." "Mg..µg.L." ## [25] "Mn..µg.L." "Mo..µg.L." ## [27] "Na..µg.L." "NH3..µg.L." ## [29] "Ni..µg.L." "Ptot..µg.L." ## [31] "Pb..µg.L." "MES...mg.L." ## [33] "Sb..µg.L." "Se..µg.L." ## [35] "Sn2.ug.L" "Tl.ug.L" ## [37] "U..µg.L." "V..µg.L." ## [39] "Zn..µg.L." "geometry"
Rename variables
ruisso_sf3 <- rename(ruisso_sf2, river = Plan.d.eau, sample_pts = Point.d.échantillonnage, dissolve_O = X.OD) names(ruisso_sf3)
## [1] "river" "sample_pts" ## [3] "Administration" "Date" ## [5] "dissolve_O" "O2..mg.L." ## [7] "Conductivité..µs.cm2." "pH" ## [9] "Température..oC." "COLI" ## [11] "Ag..µg.L." "Al..µg.L." ## [13] "As..µg.L." "Ba..µg.L." ## [15] "Be..µg.L." "Ca..µg.L." ## [17] "Cd..µg.L." "Co..µg.L." ## [19] "COT..µg.L." "Cr..µg.L." ## [21] "Cu..µg.L." "Fe..µg.L." ## [23] "K..µg.L." "Mg..µg.L." ## [25] "Mn..µg.L." "Mo..µg.L." ## [27] "Na..µg.L." "NH3..µg.L." ## [29] "Ni..µg.L." "Ptot..µg.L." ## [31] "Pb..µg.L." "MES...mg.L." ## [33] "Sb..µg.L." "Se..µg.L." ## [35] "Sn2.ug.L" "Tl.ug.L" ## [37] "U..µg.L." "V..µg.L." ## [39] "Zn..µg.L." "geometry"
Remove symbols from column names:
names(ruisso_sf3) <- gsub("\\..*", "", names(ruisso_sf3))
names(ruisso_sf3)
## [1] "river" "sample_pts" "Administration" "Date" ## [5] "dissolve_O" "O2" "Conductivité" "pH" ## [9] "Température" "COLI" "Ag" "Al" ## [13] "As" "Ba" "Be" "Ca" ## [17] "Cd" "Co" "COT" "Cr" ## [21] "Cu" "Fe" "K" "Mg" ## [25] "Mn" "Mo" "Na" "NH3" ## [29] "Ni" "Ptot" "Pb" "MES" ## [33] "Sb" "Se" "Sn2" "Tl" ## [37] "U" "V" "Zn" "geometry"
Remove accents from column names
names(ruisso_sf3) <- gsub("é", "e", names(ruisso_sf3))
names(ruisso_sf3)
## [1] "river" "sample_pts" "Administration" "Date" ## [5] "dissolve_O" "O2" "Conductivite" "pH" ## [9] "Temperature" "COLI" "Ag" "Al" ## [13] "As" "Ba" "Be" "Ca" ## [17] "Cd" "Co" "COT" "Cr" ## [21] "Cu" "Fe" "K" "Mg" ## [25] "Mn" "Mo" "Na" "NH3" ## [29] "Ni" "Ptot" "Pb" "MES" ## [33] "Sb" "Se" "Sn2" "Tl" ## [37] "U" "V" "Zn" "geometry"
There are multiple measurements during the summer.
ruisso_sf3
## Simple feature collection with 345 features and 39 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.42516 xmax: -73.50467 ymax: 45.69734 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## river sample_pts Administration Date dissolve_O ## 1 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/05/12 110.0 ## 2 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/06/01 74.0 ## 3 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/06/22 72.0 ## 4 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/08/02 79.0 ## 5 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/08/22 79.0 ## 6 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/09/20 76.8 ## 7 Rivière à l'Orme AAO-0.0 Pierrefonds-Roxboro 2016/10/25 96.9 ## 8 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/05/12 110.0 ## 9 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/06/01 81.0 ## 10 Rivière à l'Orme AAO-3.3P6 Kirkland 2016/06/22 93.0 ## O2 Conductivite pH Temperature COLI Ag Al As Ba Be Ca Cd ## 1 10.7 514 7.9 16.9 10 0.1 214 0.4 37 0.1 44200 0.1 ## 2 7.4 1434 7.9 15.0 54 0.1 60 0.5 69 0.1 109000 0.1 ## 3 6.7 1474 7.7 19.1 230 0.1 761 0.9 58 0.1 104000 0.1 ## 4 6.8 1346 7.9 22.3 300 0.1 214 0.8 67 0.1 98200 0.1 ## 5 7.3 793 7.8 19.1 550 0.1 353 0.5 56 0.1 71200 0.1 ## 6 7.1 1286 7.7 18.8 72 0.1 207 0.5 61 0.1 102000 0.1 ## 7 11.0 1414 7.7 9.4 99 0.1 259 0.4 77 0.1 122000 0.1 ## 8 11.6 1589 7.9 12.9 1200 0.1 81 0.3 60 0.1 125000 0.1 ## 9 8.1 1639 8.0 15.6 4400 0.1 95 0.4 56 0.1 123000 0.1 ## 10 9.2 897 7.9 15.6 33000 0.1 2600 1.1 63 0.1 72700 0.1 ## Co COT Cr Cu Fe K Mg Mn Mo Na NH3 Ni Ptot Pb ## 1 0.2 7.2 0.6 1.9 364 1970 15600 40.4 1.5 46600 30 1.2 37 0.5 ## 2 0.2 6.4 0.3 3.2 271 4190 37100 70.3 4.1 100000 160 2.3 33 0.2 ## 3 0.8 6.1 2.2 4.3 1200 4600 39300 82.8 4.1 100000 290 3.0 129 3.2 ## 4 0.3 15.1 0.5 1.6 393 4180 36000 65.1 3.9 100000 90 2.0 67 0.9 ## 5 0.4 4.3 1.3 3.0 533 3150 19500 44.2 2.9 58200 120 2.3 55 0.8 ## 6 0.2 5.3 0.7 3.2 367 4100 33700 18.9 3.3 100000 40 2.7 31 0.7 ## 7 0.3 4.2 0.9 3.1 422 5590 35900 24.7 3.8 100000 70 2.4 36 0.6 ## 8 0.2 3.8 0.3 3.3 237 4070 36300 57.2 2.3 100000 60 2.0 23 0.2 ## 9 0.2 3.7 0.3 8.0 258 4840 35400 68.3 3.1 100000 100 2.1 207 0.2 ## 10 2.7 23.6 9.9 30.0 3950 4790 20900 261.0 2.6 92300 490 7.6 430 5.6 ## MES Sb Se Sn2 Tl U V Zn geometry ## 1 6.8 0.5 0.5 1 0.2 0.8 0.9 7 POINT (-73.93704 45.45022) ## 2 5.7 0.5 0.5 1 0.2 1.7 0.6 7 POINT (-73.93704 45.45022) ## 3 35.3 0.5 0.5 1 0.2 1.8 2.8 12 POINT (-73.93704 45.45022) ## 4 9.0 0.5 0.5 1 0.2 1.8 1.7 7 POINT (-73.93704 45.45022) ## 5 10.3 0.5 0.5 1 0.2 1.2 1.7 7 POINT (-73.93704 45.45022) ## 6 3.6 0.5 0.5 1 0.2 1.3 1.2 7 POINT (-73.93704 45.45022) ## 7 5.8 0.5 0.5 1 0.2 2.2 1.2 7 POINT (-73.93704 45.45022) ## 8 3.4 0.5 0.5 1 0.2 1.9 0.7 7 POINT (-73.90147 45.43689) ## 9 5.0 0.5 0.5 1 0.2 1.6 1.0 7 POINT (-73.90147 45.43689) ## 10 161.0 1.2 0.5 1 0.2 0.9 9.4 108 POINT (-73.90147 45.43689)
We will use the mean of water quality measurements.
ruisso_mean <- ruisso_sf3 %>% group_by(sample_pts) %>% # split the data into groups of samples summarise_at(vars(O2:Zn), mean, na.rm = TRUE) # compute the mean by group on selected variables ruisso_mean
## Simple feature collection with 50 features and 35 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -73.93704 ymin: 45.42516 xmax: -73.50467 ymax: 45.69734 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 50 x 36 ## sample_pts O2 Conductivite pH Temperature COLI Ag Al ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 AAO-0.0 8.14 1180. 7.8 17.2 188. 0.1 295. ## 2 AAO-3.3P6 9.5 1378. 7.89 16.4 18599. 0.1 432. ## 3 AAO-3.5 8.79 1281. 7.74 14.3 612. 0.1 150. ## 4 AAO-3.6 9 1128. 7.97 16.1 461 0.1 217. ## 5 AAO-6.4P12 10.1 1461. 7.96 18.2 147 0.1 33.1 ## 6 AAO-6.5 7.4 567. 8.12 16.5 1219. 0.1 109. ## 7 ADM-1 9.97 333. 8.19 21.2 58.6 0.1 119 ## 8 ANG-2 9.7 399. 8.33 20.2 41 0.1 60.7 ## 9 BER-0.0 7.18 803. 7.61 17.1 305. 0.1 140. ## 10 BER-0.7P1 9.25 751. 7.76 17.1 1431. 0.1 75.7 ## # ... with 40 more rows, and 28 more variables: As <dbl>, Ba <dbl>, ## # Be <dbl>, Ca <dbl>, Cd <dbl>, Co <dbl>, COT <dbl>, Cr <dbl>, Cu <dbl>, ## # Fe <dbl>, K <dbl>, Mg <dbl>, Mn <dbl>, Mo <dbl>, Na <dbl>, NH3 <dbl>, ## # Ni <dbl>, Ptot <dbl>, Pb <dbl>, MES <dbl>, Sb <dbl>, Se <dbl>, ## # Sn2 <dbl>, Tl <dbl>, U <dbl>, V <dbl>, Zn <dbl>, geometry <POINT [°]>
plot(ruisso_mean)
myPal <- colorRampPalette(c("blue", "red"))
plot(ruisso_mean["Temperature"],
pal = myPal, nbreaks = 30, pch = 19, key.pos = 1,
main = "Water temperature in streams of Montreal")
MULTIPOINT to a ShapefileWe can write a simple features object to a file (e.g. a shapefile) using the st_write() function in sf (see vignette)
st_write(ruisso_mean, dsn = "data/ruisso.shp", driver = "ESRI Shapefile", delete_dsn = T)
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ## ESRI Shapefile driver
Some argument specifications depend on the driver, see
?st_write
MULTIPOLYGONWe will now import a land use vector map of Montreal.
Because the original dataset was composed of multiple individual shapefiles and were long to download, these manipulations should have been performed before the workshop using this R script.
Don’t run these lines now!
# Download shapefiles
download.file("http://cmm.qc.ca/fileadmin/user_upload/geomatique/UtilisationDuSol/2016_Shapefiles/660-US-2016.zip", destfile = "data/landuse.zip")
# Unzip the main folder and name it "landuse"
unzip("landuse.zip", exdir="data/landuse")
# get all the zip files inside the main folder "landuse"
zipF <- list.files(path = "data/landuse/", pattern = "*.zip", full.names = TRUE)
# unzip all your files
plyr::ldply(.data = zipF, .fun = unzip, exdir = "data/landuse")
Don’t run these lines now!
# Get the names of the land use shapefiles from the folder "landuse"
shp_files <- list.files(path = "data/landuse", pattern = ".shp")
shp_files <- sub(".shp", "", shp_files)
# Read all the shapefiles
LU <- list()
for(f in shp_files) {
LU[[f]] <- st_read(dsn = "data/landuse/", layer = f)
}
# Combine all shapefiles together
LU_all <- do.call(rbind, LU)
# Write as a GeoPackage
st_write(LU_all, "data/LU_all.gpkg", driver = "GPKG")
# Read GeoPackage in R LU <- st_read(dsn = "data/LU_all.gpkg")
## Reading layer `LU_all' from data source `/home/kevcaz/Github/Others/Rspatial/docs/data/LU_all.gpkg' using driver `GPKG' ## Simple feature collection with 107233 features and 38 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: 265961.2 ymin: 5027324 xmax: 306814.6 ymax: 5063078 ## epsg (SRID): NA ## proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
Notice that the EPSG code is not defined
To define the CRS when it’s missing we can use st_set_crs().
Note however that replacing crs does not re-project data. If we wanted to transform coordinate and re-project them, we would use st_transform().
LU <- st_set_crs(LU, 32188)
MULTIPOLYGONWe can map only one land use at a time by subsetting first the sf object. UTIL_SOL==1000 represents water. Don’t try to plot everything… it would take forever!
plot(st_geometry(subset(LU, UTIL_SOL==1000)))
Geometry validity refers to essential properties of polygons, such as non-self intersecting, holes being inside polygons, more than 4 points, closing segments. Invalid geometry can be ignored for mapping but cause problem for spatial operations.
st_is_valid() returns a logical vector indicating for each polygon geometries indicating whether it is topologically valid:
head(invalid_poly <- which(!st_is_valid(LU)))
## [1] 862 1149 1251 1406 2071 2119
Geometry validity refers to essential properties of polygons, such as non-self intersecting, holes being inside polygons, more than 4 points, closing segments. Invalid geometry can be ignored for mapping but cause problem for spatial operations.
Using the argument reason = TRUE returns the reason for invalidity:
st_is_valid(LU[invalid_poly,], reason = TRUE)[1:8]
## [1] "Nested shells[295499.0604 5045532.3412]" ## [2] "Nested shells[298395.5742 5045168.6803]" ## [3] "Self-intersection[297511.205 5032954.7058]" ## [4] "Nested shells[273102.34356 5035592.911633]" ## [5] "Nested shells[274053.456247 5036041.908298]" ## [6] "Nested shells[288581.300144 5036823.875472]" ## [7] "Self-intersection[294414.9683 5031596.2768]" ## [8] "Self-intersection[295265.9577 5045076.9297]"
par(mfrow = c(1,2)) plot(st_geometry(LU[862,]), main = "Nested shells") plot(st_geometry(LU[1251,]), main = "Self-intersection")
We can use st_make_valid() from lwgeom to make an invalid geometry valid
LU_val <- st_make_valid(LU)
# let's verify if all geometries are now valid which(!st_is_valid(LU_val)) # yeah!
## integer(0)

st_read(dsn = path_to_file, layer = file_name, driver = "ESRI Shapefile") and name it courseau
If you did not load the shapefile before the workshop you can download it using
download.file()from this link : http://donnees.ville.montreal.qc.ca/dataset/c128aff5-325c-4599-ab66-1c9d0b3abc94/resource/a37e11d4-f0a3-46a7-8636-76754fad72b3/download/courseau.zip
Create a simple map of the geometry
TYPE. You can change the default colors using the argument pal.
courseau
## Simple feature collection with 1306 features and 5 fields ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: -73.97268 ymin: 45.41593 xmax: -73.4975 ymax: 45.69939 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## OBJECTID_1 NOM TYPE Shape_Le_1 NuméroRui ## 1 1 rivière à l'Orme rivière 177.95299 <NA> ## 2 2 rivière à l'Orme rivière 128.51146 <NA> ## 3 3 <NA> fossé 172.42988 <NA> ## 4 4 rivière à l'Orme rivière 216.66838 <NA> ## 5 8 <NA> fossé 540.29539 <NA> ## 6 9 <NA> ruisseau 97.66412 <NA> ## 7 10 <NA> ruisseau 162.30584 <NA> ## 8 11 <NA> ruisseau 210.75794 <NA> ## 9 14 <NA> ruisseau 608.28651 <NA> ## 10 16 <NA> fossé 267.61108 <NA> ## geometry ## 1 MULTILINESTRING ((-73.9107 ... ## 2 MULTILINESTRING ((-73.90824... ## 3 MULTILINESTRING ((-73.90667... ## 4 MULTILINESTRING ((-73.91472... ## 5 MULTILINESTRING ((-73.91029... ## 6 MULTILINESTRING ((-73.9206 ... ## 7 MULTILINESTRING ((-73.91969... ## 8 MULTILINESTRING ((-73.91727... ## 9 MULTILINESTRING ((-73.91727... ## 10 MULTILINESTRING ((-73.91212...
rasterraster classesraster provides three main classes of raster object:
RasterLayer - a single-layer (variable) raster (e.g. elevation)
RasterStack - one single object several single-layer (variable) rasters stored in one or different files (e.g. Worldclim bioclimatic variables)
RasterBrick one single object several single-layer (variable) rasters stored in one single file (e.g. a single multispectral satellite file)
We now import raster data use a .tif file of a canopy index from Montreal.
# Download tif file from web page in your working directory
if (!file.exists("data/canopee.zip")){
download.file("http://cmm.qc.ca/fileadmin/user_upload/geomatique/IndiceCanopee/2015/660_ Canopee2015_3m.zip", destfile = "data/canopee.zip") }
unzip("data/canopee.zip", exdir = "data")
# Read tif in R using raster
# The file named "660_CLASS_3m.tif" contains the canopy index for all the Montreal area, so we can read this file only
canopee_mtl <- raster("data/660_CLASS_3m.tif")
rastercanopee_mtl
## class : RasterLayer ## dimensions : 35754, 40854, 1460693916 (nrow, ncol, ncell) ## resolution : 1, 1 (x, y) ## extent : 265961, 306815, 5027324, 5063078 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## data source : /home/kevcaz/Github/Others/Rspatial/docs/data/660_CLASS_3m.tif ## names : X660_CLASS_3m ## values : 1, 5 (min, max)
The canopy index dataset is a RasterLayer with values from 1 to 5, 35754 pixels by row and 40854 pixels by column and resolution of 1m x 1m.
rasterSimilar to the sf package, raster also provides plot methods for its own classes.
plot(canopee_mtl)
getDataIn package raster, getData() function generates requests to access to different spatial datasets (either rasteror sp objects). Argument name specifies the dataset you wish to download.
GADM- global administrative boundaries at different level of administrative subdivision
worldclim- global interpolated climate data
altandSTRM- coarse and fine resolution elevation data
ISO3- 3 letter ISO codes for country names.
head(getData("ISO3"))
## ISO3 NAME ## 1 ABW Aruba ## 2 AFG Afghanistan ## 3 AGO Angola ## 4 AIA Anguilla ## 5 ALA Åland ## 6 ALB Albania
getDataRetrieve a raster of altitude for Canada.
# alt90 <- getData('SRTM', lon = -73.7, lat = 45.5) # Fine resolution
altCAN <- getData(name = "alt", country = "CAN", path = "data/") # Coarse resolution
altCAN
## class : RasterLayer ## dimensions : 4992, 10620, 53015040 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -141.1, -52.6, 41.6, 83.2 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : /home/kevcaz/Github/Others/Rspatial/docs/data/CAN_msk_alt.grd ## names : CAN_msk_alt ## values : -108, 5800 (min, max)
Because the resolution of the altitude raster (0.0083x0.0083° = 650x926m) is a lot coarser than that of the canopy index (1x1m), we will use the altitude raster for examples of raster manipulations to reduce computation time.
getDataRetrieve a raster of altitude for Canada.
plot(altCAN)

getData() to retrieve Canadian boundary map at the provincial level
?getDataTransform it to a sf object using st_as_sf()
Subset the Quebec boundary in the column NAME_1
Re-project your new object using the same projection that of the land use polygons (st_crs(LU_val) or with the EPSG code 32188)
Try to plot the geometry of the Quebec polygon.
Try qc_simple_prj <- st_simplify(qc_prj, dTolerance = 500, preserveTopology = F) and plot the geometry of this new object. Is there a difference?
plot(st_geometry(qc_simple_prj))
sfruisso_buf <- st_buffer(st_geometry(ruisso_mean), dist = 0.01)
## Warning in st_buffer.sfc(st_geometry(ruisso_mean), dist = 0.01): st_buffer ## does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
dist()is assumed to be in decimal degrees` This message indicates that sf assumes a distance value is given in degrees
st_buffer()does not correctly buffer longitude/latitude dataThis warning indicates that the result may be incorrect becausest_buffer` expects coordinates in a 2-D, Euclidean space, which is not the case for longitude latitude data. So we should re-project the data onto a projected CRS.
plot(st_geometry(ruisso_mean), pch = 19, cex= .8) plot(ruisso_buf, add = T, border= "blue3")
st_transform()We can re-project the sample points using a suitable projection that has units of meters. To do this, we will use the projection from our land use polygons.
(ruisso_proj <- st_transform(ruisso_mean, crs = st_crs(LU_val)))
## Simple feature collection with 50 features and 35 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: 270615.3 ymin: 5031758 xmax: 304436.2 ymax: 5061940 ## epsg (SRID): 32188 ## proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## # A tibble: 50 x 36 ## sample_pts O2 Conductivite pH Temperature COLI Ag Al ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 AAO-0.0 8.14 1180. 7.8 17.2 188. 0.1 295. ## 2 AAO-3.3P6 9.5 1378. 7.89 16.4 18599. 0.1 432. ## 3 AAO-3.5 8.79 1281. 7.74 14.3 612. 0.1 150. ## 4 AAO-3.6 9 1128. 7.97 16.1 461 0.1 217. ## 5 AAO-6.4P12 10.1 1461. 7.96 18.2 147 0.1 33.1 ## 6 AAO-6.5 7.4 567. 8.12 16.5 1219. 0.1 109. ## 7 ADM-1 9.97 333. 8.19 21.2 58.6 0.1 119 ## 8 ANG-2 9.7 399. 8.33 20.2 41 0.1 60.7 ## 9 BER-0.0 7.18 803. 7.61 17.1 305. 0.1 140. ## 10 BER-0.7P1 9.25 751. 7.76 17.1 1431. 0.1 75.7 ## # ... with 40 more rows, and 28 more variables: As <dbl>, Ba <dbl>, ## # Be <dbl>, Ca <dbl>, Cd <dbl>, Co <dbl>, COT <dbl>, Cr <dbl>, Cu <dbl>, ## # Fe <dbl>, K <dbl>, Mg <dbl>, Mn <dbl>, Mo <dbl>, Na <dbl>, NH3 <dbl>, ## # Ni <dbl>, Ptot <dbl>, Pb <dbl>, MES <dbl>, Sb <dbl>, Se <dbl>, ## # Sn2 <dbl>, Tl <dbl>, U <dbl>, V <dbl>, Zn <dbl>, geometry <POINT [m]>
ruisso_buf <- st_buffer(st_geometry(ruisso_proj), dist = 500) # add an attribute for sample points id ruisso_buf <- st_sf(sample_pts = ruisso_mean$sample_pts, ruisso_buf) plot(st_geometry(ruisso_proj), pch = 19, cex= .5) plot(st_geometry(ruisso_buf), add = T, border= "blue3")

Re-project the courseau object using the land use projection and name it courseau_proj
Create a 300m buffer around each watercourse and name it courseau_buf. Plot the resulting geometry with courseau_proj on top.
Try st_union() on courseau_buf. Plot the resulting geometry and compare with courseau_buf.
Try st_centroid() on ruisso_buf. Plot the resulting geometry with ruisso_buf under.
For more geometric transformation such
st_difference(),st_convex_hull(),st_intersection()see sf vignette #3
LU_reclas <- LU_val %>%
dplyr::select(ID, UTIL_SOL) %>%
mutate(UTIL_SOL = case_when(UTIL_SOL %in% c(100:114) ~ "resid",
UTIL_SOL %in% c(200:520) ~ "public_build",
UTIL_SOL == 600 ~ "green",
UTIL_SOL %in% c(700:760) ~ "road",
UTIL_SOL == 800 ~ "agri",
UTIL_SOL == 900 ~ "vacant",
UTIL_SOL == 1000 ~ "water",
UTIL_SOL == 1100 ~ "golf"))
case_whenallows you to vectorize multiple if and else if statements
Now we aim at finding the proportion of area per buffer occupied by different land-use types. st_intersection() can be used to obtain the intersection of two geometries (i.e. the area covered by both).
LU_inters <- st_intersection(LU_reclas, ruisso_buf)
## Warning: attribute variables are assumed to be spatially constant ## throughout all geometries
Warning: attribute variables are assumed to be spatially constantAttribute values are assigned to sub-geometries; if these are spatially constant, as for instance for land use, then this is fine. If they are aggregates, such as population count, then this is wrong.
plot(LU_inters["UTIL_SOL"])
# add in a column and compute area (area of each land use poly in intersect layer) LU_inters$areaLU <- st_area(LU_inters) # group data by sample points and land use type and calculate the total area for each land use per buffer LU_area <- LU_inters %>% group_by(sample_pts, UTIL_SOL) %>% summarise(areaLU = sum(areaLU)) # remove geometry st_geometry(LU_area) <- NULL head(LU_area)
## # A tibble: 6 x 3 ## # Groups: sample_pts [1] ## sample_pts UTIL_SOL areaLU ## <fct> <chr> <S3: units> ## 1 AAO-0.0 agri 1211.9344357606 ## 2 AAO-0.0 green 353192.096990321 ## 3 AAO-0.0 public_build 5908.16970410941 ## 4 AAO-0.0 resid 29445.8119433945 ## 5 AAO-0.0 road 38497.7872294817 ## 6 AAO-0.0 vacant 110014.00807651
# buffer with 500m radius so area is pi*500^2 # try st_area(ruisso_buf) # compute proportion for each land use LU_area$propLU <- as.numeric(LU_area$areaLU/(pi*500^2)) # transform to wide format and create a new data frame containing our new landscape variables env_var <- dcast(data = LU_area, formula = sample_pts ~ UTIL_SOL, fill = 0)
## Using propLU as value column: use value.var to override.
rastercrop() will decrease the extent of a raster using the extent of Montreal area.
extMtl <- extent(c(xmin=-73.97342, xmax=-73.47562, ymin=45.39904, ymax=45.73252)) alt_crop <- crop(altCAN, extMtl) # Crop the raster
crop() will decrease the extent of the raster using the extent of Montreal area.
par(mfrow=c(1,2)) plot(altCAN) plot(alt_crop)
The projection of alt_crop is in longitude/latitude.
alt_crop
## class : RasterLayer ## dimensions : 40, 60, 2400 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -73.975, -73.475, 45.4, 45.73333 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : in memory ## names : CAN_msk_alt ## values : 0, 179 (min, max)
We can use projectRaster() to transform the CRS of one spatial object to match another spatial object
st_crs(LU_val)[[2]] # to retrieve the proj4string
## [1] "+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
alt_proj <- projectRaster(alt_crop, crs = st_crs(LU_val)[[2]]) alt_proj
## class : RasterLayer ## dimensions : 48, 72, 3456 (nrow, ncol, ncell) ## resolution : 650, 926 (x, y) ## extent : 263713, 310513, 5025305, 5069753 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ## data source : in memory ## names : CAN_msk_alt ## values : 0.9513865, 173.2387 (min, max)
Compute the mean altitude per buffer
sample_alt <- raster::extract(alt_proj, as(ruisso_buf, "Spatial"), fun=mean, na.rm=TRUE) head(sample_alt)
## [,1] ## [1,] 23.51085 ## [2,] 28.15839 ## [3,] 28.15839 ## [4,] 28.22916 ## [5,] 32.50139 ## [6,] 30.12374
# Add this new environmental variables to our data frame env_var <- cbind.data.frame(env_var, altitude = sample_alt)
See the package
veloxfor faster raster extraction
Now that we have a clean data frame with water quality variables (mean measurements of water quality in different watercourses) and a data frame of landscape variables of interest (% of different land use types in a 500m buffer and mean altitude in a 500m buffer), we could perform various statistical analyses.
For instance, we could run a Redundancy Analysis (RDA) to study the influence of the landscape on water quality. If we had data on macroinvertebrate abundance, we could study the relative importance of the landscape and water quality on their distribution using variation partitioning on RDA.
plot() allows combination of multiple layers of information in a same graph using add = T
plot(canopee_mtl) plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T)
Change the color palette for the raster levels. There are 5 levels and only 2 are pertinent: - 3 = low vegetation cover - 4 = canopy (see here for info on the classification).
myPal <- c("white", "white", "#BAE4B3", "#006D2C", "white") # color palette
plot(canopee_mtl, col = myPal, legend = F) # canopy raster
plot(st_geometry(subset(LU_val, UTIL_SOL==1000)), # UTIL_SOL==1000 -> rivers
col = "#7eb0fc", border = "#7eb0fc",add=T)
plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) # watercourse
plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T) # sample points
legend("bottomright", legend = c("Low vegetation", "Canopy"), # legend
fill = c("#BAE4B3", "#006D2C"), border="white", bty = "n")
Add an inset
par(mar=c(3,3,2,0))
plot(canopee_mtl, col = myPal, legend = F) # canopy raster
plot(st_geometry(subset(LU_val, UTIL_SOL == 1000)), # UTIL_SOL==1000 -> rivers
col = "#7eb0fc", border = "#7eb0fc", add = T)
plot(st_geometry(courseau_proj), col = "#034ebf", lwd = 1.5, add = T) # watercourse
plot(st_geometry(ruisso_proj), pch = 19, col = "#B00C0C", add = T) # sample points
legend("bottomright", legend = c("Low vegetation", "Canopy"), # legend
fill = c("#BAE4B3", "#006D2C"), border="white", bty = "n")
par(fig = c(0.08, 0.6, 0.42, 1), new = T) # add inset at specified coordinates of the figure region
plot(st_geometry(qc_simple_prj), col = "grey85", bgc = "transparent") # Quebec polygon
points(286543, 5038936, pch = 17, col = "#B00C0C") # point on Montreal
mapviewJoin our environmental variables data frame as attributes to the sf sample points
ruisso_env <- left_join(ruisso_proj, env_var, by = "sample_pts")
## Warning: Column `sample_pts` joining character vector and factor, coercing ## into character vector
mapview(ruisso_env, zcol = 'COLI', legend=TRUE, layer.name = "E. coli density")
mapview(ruisso_env, cex = "public_build", map.types = "Esri.WorldImagery")
myPal <- colorRampPalette(brewer.pal(9, "YlGnBu"))
mapview(ruisso_env, zcol = "resid", cex = "COLI", legend = TRUE, col.regions = myPal,
layer.name = "% of residential area")

Good tutorials for spatial data in R
sf manipulations
Maps in R
Get free data